Abstract

Protein folds are built primarily from the packing together of two types of
structures: α-helices and β-sheets. Neither structure is rigid, and the flexibility
of helices and sheets is often important in determining the final fold
(e.g., coiled coils and β-barrels). Recent work has quantified the flexibility
of α-helices using a principal-component analysis (PCA) of database helical
structures (Emberly, 2003). Here, we extend the analysis to β-sheet flexibility
using PCA on a database of β-sheet structures. For sheets of varying
dimension and geometry, we find two dominant modes of flexibility: twist
and bend. The distributions of amplitudes for these modes are found to be
Gaussian and independent, suggesting that the PCA twist and bend modes
can be identified as the soft elastic normal modes of sheets. We consider the
scaling of mode eigenvalues with sheet size and find that parallel β-sheets are
more rigid than anti-parallel sheets over the entire range studied. Lastly, we
discuss the application of our PCA results to modeling and design of β-sheet
proteins.

I. INTRODUCTION 

Most protein folds can be viewed as compact packings of a fixed set of secondary-structural
elements: α-helices and β-sheets. It can be reasoned that the formation of
these elements greatly simplifies the folding free-energy landscape by reducing the number
of degrees of freedom. As a first approximation, helices and sheets can be considered as rigid
objects, possessing only six degrees of freedom each (three translations and three rotations).
However, most helices and sheets display some amount of bending in a protein's final fold.
Understanding to what extent these elements are flexible, and which are their dominant
degrees of freedom, will help to further our understanding of how proteins fold, and even
how they function.

A number of studies have extracted the flexible motions of biological molecules using
normal-mode and principal-component analysis (PCA). A recent PCA of α-helices
from a structural database revealed three "soft" modes: two degenerate bend modes
and a twist mode. For all but the longest helices, these three modes were sufficient to
describe the deformations observed in real structures.

Arguably, a quantitative understanding of flexibility is more important for β-sheets than
for α-helices. In natural structures, sheets display a variety of highly distorted and bent
shapes, e.g. β-barrels and twisted sheets, while helices are generally much less distorted.
What are the dominant collective motions of β-sheets, and how do they depend on the sheet's
size? How does sheet geometry affect the flexibility of a sheet? The "geometry" of a
β-sheet is the amino-terminal to carboxyl-terminal orientation of the various strands making
up the sheet. Most sheets fall into one of two geometries: parallel, where all the strands
are oriented in the same direction, or anti-parallel, where strands alternate direction. The
geometry dictates the hydrogen-bonding pattern within the sheet and hence plays a role in
determining the sheet's flexibility.

Here, we report a principal-component analysis of the flexibility of parallel and
anti-parallel β-sheets from the Protein Data Bank. The sheets considered range in size from 3 to
6 strands, with 3 to 6 residues per strand. For both parallel and anti-parallel sheets, we find
two dominant modes of flexibility: twisting about an in-plane axis that is perpendicular to
the strand orientation, and bending of this same axis. The distributions of amplitudes for
these two modes are independent Gaussians. Thus, the principal-component modes can be
interpreted as dynamical normal modes of an elastic object. Motivated by this interpretation,
we consider the scaling of mode eigenvalues (variances of amplitudes) with sheet size, and
compare to predictions of a simple elastic model. For all sizes considered, parallel sheets are
more rigid than anti-parallel sheets.

Recently, β-sheet structures have been characterized in detail by Ho and Curmi. This
database study focused on average properties of sheets, including twist, shear, and
hydrogen bonding. In contrast, our PCA of β-sheets provides a characterization of the
flexibility of sheets about their average structures. Possible applications of the results
reported here on sheet flexibility include improved parameterization of force-field models and
inclusion of sheet elastic energies in β-protein design.

II. RESULTS 

A. Principal-component analysis of database β-sheets

The first step in analyzing the flexibility of β-sheets was to obtain a representative set of
structures. Following the procedure described in Methods, we were able to create a database
of 3516 representative β-sheets of different geometries. In Fig. 1 we show the distribution of
geometries (parallel, anti-parallel, etc.) for sheets consisting of at least 6 strands. For sheets
that have more than 6 strands, we consider all the 6-stranded sub-sheets (e.g. an 8-stranded
sheet would contain three 6-stranded sub-sheets). The strand directions are labeled 'u' for up and
'd' for down, as if the sheet were lying flat on the page. By convention, the first strand is
always oriented in the up direction, i.e. with the amino-terminal at the bottom of the page
and the carboxyl-terminal at the top of the page. Since each of the two outermost strands
of a sheet can be equally well regarded as the first strand, the distribution of geometries in
Fig. 1 includes every sheet twice. We find that the distribution of geometries is highly
non-uniform, with the most frequent types being parallel (uuuuuu) and anti-parallel (ududud).
Because other geometries occur so infrequently, we focus the rest of the analysis on parallel
and anti-parallel sheets.
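
The counting convention described above (all 6-stranded sub-sheets, each read from both outer strands, with the first strand always labeled up) can be illustrated with a minimal sketch. The function names and the three input strings are hypothetical and are not taken from the database; relabeling by exchanging 'u' and 'd' is assumed to be equivalent to re-orienting the sheet so that its first strand points up.

```python
from collections import Counter

def swap(label):
    """Exchange 'u' and 'd' in a strand-direction string."""
    return label.translate(str.maketrans("ud", "du"))

def normalize(label):
    """Relabel so that the first strand points up ('u')."""
    return label if label[0] == "u" else swap(label)

def six_strand_geometries(directions, width=6):
    """Yield both readings of every `width`-stranded sub-sheet."""
    for start in range(len(directions) - width + 1):
        window = directions[start:start + width]
        yield normalize(window)        # read starting from one outer strand
        yield normalize(window[::-1])  # read starting from the other outer strand

# Hypothetical sheets given as strand directions in spatial order.
counts = Counter()
for sheet in ["uuuuuuuu", "udududud", "uududu"]:
    counts.update(six_strand_geometries(sheet))
```

An 8-stranded sheet contributes three windows and therefore six entries, which is why every sheet appears twice in a histogram built this way.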

Sets of defect-free β-sheets of a given class were extracted from the 3516 representative
β-sheets. Sheets are of the same class if they have the same size (S strands each of length
L), geometry, and pleatedness or corrugation (see Methods). To quantify flexibility, we
performed a structural principal-component analysis (PCA) for each class of sheets, to identify
the dominant collective fluctuations around the mean structure. To implement the PCA,
we first computed the mean structure for each class of sheets via an iterative procedure.
Starting with a randomly chosen sheet of the desired class, we aligned to it all other sheets
of the same class. To align two sheets, we minimized the coordinate root mean square (crms)
distance between their corresponding Cα atoms. A mean structure was then obtained by
averaging the position of each atom over all the aligned structures. This procedure was
iterated, each time using the new mean structure as the basis for alignment, until the mean
structure converged to within 10~^ A/residue. An example of a subset of structures aligned 
to the converged mean structure is shown in Fig. 2. For anti-parallel sheets we find that the
average structure conforms to the sheared structure discussed in Ho and Curmi. They also
showed that parallel sheets are less sheared and we find this to be the case for our average
parallel sheets.
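
The iterative averaging can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the rigid superposition is done with the Kabsch (SVD) method, which is one standard way to minimize the crms distance; the first structure, not a randomly chosen one, seeds the iteration; and the tolerance `tol` and the function names are placeholders.

```python
import numpy as np

def kabsch_align(mobile, target):
    """Rigidly superpose `mobile` onto `target` (both n_atoms x 3), minimizing RMSD."""
    mobile_c = mobile - mobile.mean(axis=0)
    target_mean = target.mean(axis=0)
    target_c = target - target_mean
    u, _, vt = np.linalg.svd(mobile_c.T @ target_c)
    d = np.sign(np.linalg.det(u @ vt))           # guard against improper rotations
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return mobile_c @ rotation + target_mean

def mean_structure(structures, tol=1e-6, max_iter=100):
    """Iterate align-then-average until the mean moves by less than `tol` per atom."""
    reference = np.asarray(structures[0], dtype=float)
    for _ in range(max_iter):
        aligned = np.array([kabsch_align(s, reference) for s in structures])
        new_reference = aligned.mean(axis=0)
        change = np.sqrt(((new_reference - reference) ** 2).sum(axis=1).mean())
        reference = new_reference
        if change < tol:
            break
    return reference, aligned
```

The returned `aligned` array holds the structures superposed on the reference used in the final iteration, which differs from the returned mean by less than `tol`.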

The second step in the principal-component analysis was to compute the structural
covariance matrix for each class of sheet. A covariance matrix measures the correlation
of the variation from the mean for each pair of coordinates. In our case, there were 3SL
coordinates: 3 spatial directions for each of S × L atoms. Consequently, the covariance
matrix was a 3SL × 3SL matrix, with elements i,j defined as

$$C_{ij} = \frac{1}{N} \sum_{m=1}^{N} \left(x_{m,i} - \langle x_i \rangle\right)\left(x_{m,j} - \langle x_j \rangle\right), \tag{1}$$

where N is the number of sheets in the given class, $x_{m,i}$ is the ith coordinate of the mth
structure, and $\langle x_i \rangle$ is the ith coordinate of the mean structure.

To complete the principal-component analysis, we computed the eigenvalues {$\lambda_q$} and
eigenvectors {$v_q$} of the covariance matrix for each class of sheets (available as Supplementary
Material). The largest eigenvalues and corresponding eigenvectors represent the directions
for which the data has the largest variance. These directions are the "soft" modes of the
sheets, i.e. those collective deformations that appear with largest amplitude in the data set.
Figure 3 shows the top 10 eigenvalues for anti-parallel sheets of size S = 4 and L = 5, as
shown in Fig. 2. Each eigenvalue is given in units of Å² and measures the variance of the
distribution for a particular mode. Two dominant eigenvalues are evident in Fig. 3. The
first mode is primarily a twist of the sheet about the in-plane axis perpendicular to the
strand orientation (Fig. 4(a)). The second mode is primarily a bend of the sheet along the
same axis (Fig. 4(b)).

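
A minimal sketch of the covariance and eigenvector steps is given below. It assumes the structures have already been superposed on their mean, uses a 1/N normalization of the covariance matrix (an assumption), and runs on a toy array rather than on sheet coordinates; `pca_modes` is a hypothetical name.

```python
import numpy as np

def pca_modes(aligned):
    """Return eigenvalues (descending), eigenvectors (as columns) and displacements.

    aligned: array of shape (N, n_atoms, 3), already superposed on a common mean.
    """
    n_struct = aligned.shape[0]
    x = aligned.reshape(n_struct, -1)     # N x 3SL coordinate matrix
    dx = x - x.mean(axis=0)               # displacements from the mean structure
    cov = dx.T @ dx / n_struct            # covariance matrix, 1/N normalization (assumed)
    evals, evecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
    return evals[::-1], evecs[:, ::-1], dx

# Toy data: 4 "structures" of 3 atoms that differ along two orthogonal coordinates.
toy = np.zeros((4, 3, 3))
toy[:, 0, 0] = [2.0, -2.0, 2.0, -2.0]     # variance 4 along flattened coordinate 0
toy[:, 1, 1] = [1.0, 1.0, -1.0, -1.0]     # variance 1 along flattened coordinate 4
evals, evecs, dx = pca_modes(toy)
```

Because the covariance matrix is symmetric, `numpy.linalg.eigh` is the natural choice; the columns of `evecs` play the role of the modes and `evals` the role of their variances.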

For all the classes of sheets considered, we found two dominant soft modes. The eigenvalues
(i.e. variances) of these modes are shown for different sheet sizes and geometries in
Figs. 5 and 6. Figure 5 shows the scaling of the eigenvalues with the number of strands S
for sheets of fixed strand length. The bend-mode eigenvalues increase approximately as $S^4$,
while the twist-mode eigenvalues increase more slowly with S (fits to $S^4$ for the anti-parallel
bend modes are shown). As a result of this difference in scaling, the eigenvalues for the
twist and bend modes cross with increasing number of strands. For five or more strands,
the eigenvalue for the bend mode becomes the larger, implying greater deformations of the
sheet by bending compared to twisting. Figure 6 shows the scaling of the eigenvalues with
the strand length L for sheets with a fixed number of strands. The eigenvalues generally
increase with strand length, scaling roughly as $L^2$ or $L^4$. The scaling behavior expected for
pure bend and twist modes is discussed in the next section.

A feature that emerges from the scaling graphs is that the twist-mode and bend-mode 
eigenvalues are almost always larger for anti-parallel sheets than for parallel sheets. Since 
these two modes dominate deviations from the mean structure, this implies that total de- 
formations are typically larger for anti-parallel sheets than for parallel sheets. 

Next, we consider the actual distributions of amplitudes for the dominant twist and
bend modes. The displacement of a given sheet from the mean structure, $\delta x = x - \langle x \rangle$,
can be expanded in terms of the PCA eigenvectors as $\delta x = \sum_q a_q v_q$. The amplitude $a_q$
is given by the projection of the displacement vector $\delta x$ onto mode q. Figures 7(a) and 7(b)
show the distributions of projections onto twist and bend for the 1454 anti-parallel sheets
with S = 4 and L = 5 (cf. Fig. 2). The two distributions can be fit well by Gaussians,
with the variances of the Gaussians, 9.6985 Å² for twist and 4.3655 Å² for bend, close to
the exact variances given by the mode eigenvalues, 7.7876 Å² for twist and 4.0044 Å² for
bend. By construction, all PCA modes are uncorrelated to lowest order, i.e. $\langle a_q a_{q'} \rangle = 0$
for $q \neq q'$. To look for possible higher-order correlations, we made a scatter plot of the
amplitudes for the two dominant modes, as shown in Fig. 7(c). The distribution of points
is roughly ellipsoidal, indicating that there are no strong higher-order correlations between
modes. Similar independent Gaussian distributions were obtained for all classes of sheets
for which PCA was performed.
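
The projection and the two correlation checks can be illustrated on synthetic data. In this sketch the "structures" are random vectors with two soft directions, not sheet coordinates, and the correlation between squared amplitudes is only one possible numerical stand-in for the visual inspection of the scatter plot; it is not the test used in the analysis above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_struct, n_coord = 5000, 12

# Synthetic stand-in for aligned coordinates: two soft directions plus weak noise.
basis, _ = np.linalg.qr(rng.normal(size=(n_coord, n_coord)))
sigmas = np.array([3.0, 2.0] + [0.1] * (n_coord - 2))
x = (rng.normal(size=(n_struct, n_coord)) * sigmas) @ basis.T

dx = x - x.mean(axis=0)                            # displacements from the mean
evals, evecs = np.linalg.eigh(dx.T @ dx / n_struct)
evals, evecs = evals[::-1], evecs[:, ::-1]         # sort modes by decreasing variance

amps = dx @ evecs[:, :2]                           # amplitudes a_q of the two softest modes
linear = np.mean(amps[:, 0] * amps[:, 1])          # <a_1 a_2>, zero by construction
higher = np.corrcoef(amps[:, 0] ** 2, amps[:, 1] ** 2)[0, 1]  # a higher-order check
```

The variance of each column of `amps` reproduces the corresponding eigenvalue, and `linear` vanishes up to rounding; only `higher` carries information beyond what PCA guarantees.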

The Gaussian distributions of mode amplitudes and the lack of higher-order correlations
between modes suggest that the PCA modes can be interpreted as the dynamical modes of
a β-sheet. This is consistent with previous results for α-helices showing the near identity
between PCA modes obtained from static structures and the elastic normal modes of a model
helix. For small amplitude motions, elastic normal modes at equilibrium have independent
Gaussian distributions $P(a_q)$ determined by Boltzmann weights

$$P(a_q) \sim e^{-E(a_q)/k_B T} = e^{-c_q a_q^2 / 2 k_B T}, \tag{2}$$

where $E(a_q) = c_q a_q^2/2$ is the deformation energy as a function of the mode amplitude $a_q$, and
$c_q$ is the spring constant for the mode. The "soft" modes have the smallest spring constants,
and therefore the broadest spread of amplitudes.
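
As a short worked step, the variance of the Gaussian above links each PCA eigenvalue to an effective spring constant. The identification assumes that the database ensemble can be assigned a single effective temperature T, which is an interpretive assumption rather than a measured quantity.

```latex
\lambda_q \;=\; \langle a_q^2 \rangle
\;=\; \frac{\int_{-\infty}^{\infty} a^2 \, e^{-c_q a^2 / 2 k_B T}\, da}
           {\int_{-\infty}^{\infty} e^{-c_q a^2 / 2 k_B T}\, da}
\;=\; \frac{k_B T}{c_q}
\qquad\Longrightarrow\qquad
c_q \;=\; \frac{k_B T}{\lambda_q}.
```

For example, an eigenvalue of 4.0044 Å² corresponds to an effective spring constant of about 0.25 $k_B T$/Å².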

B. Scaling of the PCA modes

Guided by the interpretation of the PCA modes as the elastic normal modes of a sheet,
we consider the scaling with sheet size of the mode eigenvalues. Let us first consider the
dominant bend mode, as shown in Fig. 4(b). For a uniform bend of the in-plane axis
perpendicular to the strand orientation, the displacement of the strand at position x along
this axis goes as $\delta z \sim x^2/R$, where R is the radius of curvature. The bending eigenvalue is
given by $\lambda_{\rm bend} = \langle (\delta\mathbf{z} \cdot \mathbf{v})^2 \rangle$, where $\mathbf{v}$ is the normalized eigenvector. It follows that

$$\lambda_{\rm bend} \sim L S^5 \left\langle \frac{1}{R^2} \right\rangle, \tag{3}$$

where L is the strand length and S is the number of strands. At thermal equilibrium, each
normal mode has $k_B T/2$ of potential energy. For the bend mode, this energy would be put
into curvature of the axis:

$$\frac{1}{2} k_B T = \frac{1}{2} \kappa L S \left\langle \frac{1}{R^2} \right\rangle, \tag{4}$$

where $\kappa$ is the bend stiffness per unit length, indicating

$$\lambda_{\rm bend} \sim \frac{k_B T}{\kappa} S^4. \tag{5}$$

Thus the eigenvalue for a pure bend mode would scale as $S^4$ and would be independent of L.
In Fig. 5, the predicted $S^4$ scaling of the bend-mode eigenvalue is seen for both parallel and
anti-parallel β-sheets. In Fig. 6, however, we observe a significant increase of the eigenvalue
with L, which is not predicted. The reason for the increase of the bend-mode eigenvalue
with strand length L is likely to be the significant bending of individual strands associated
with this mode, which would contribute a term to $\lambda_{\rm bend}$ that scales as $L^4$.

For the twist mode, as shown in Fig. 4(a), we assume that each strand rotates by an angle
$\delta\theta$ with respect to its neighboring strand. This corresponds to a uniform twist of the sheet
about the in-plane axis perpendicular to the strand orientation. The displacement of a Cα
atom on the strand at x along this axis, and at a distance l from this axis, is $\delta z \sim l (x/d) \delta\theta$,
where d is the distance between neighboring strands. It is straightforward to show that
the eigenvalue for the twist mode goes as

$$\lambda_{\rm twist} \sim \langle \delta\theta^2 \rangle L^3 S^3. \tag{6}$$

At thermodynamic equilibrium, using $k_B T = c \langle \delta\theta^2 \rangle L S$, where c is a twist stiffness per unit
length, we find $\lambda_{\rm twist} \sim L^2 S^2$. As shown in Fig. 6, the twist-mode eigenvalue does scale as
$L^2$ as predicted, at least for sheets of up to S = 5 strands. As seen in Fig. 5, however, the
scaling of the twist-mode eigenvalue with strand number is much weaker than the predicted
$S^2$, for all strand lengths.
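
The geometric factors in the two scaling arguments can be checked numerically. The sketch below idealizes a sheet as a rectangular grid of Cα positions with unit spacing (an assumption), applies a pure bend or a pure twist, and fits the exponent of the summed squared displacement, which equals the squared projection onto the normalized mode. The grid sizes are far larger than real sheets and serve only to reach the asymptotic power laws; all names and parameter values are illustrative.

```python
import numpy as np

def squared_amplitude(S, L, mode, d=1.0, R=100.0, dtheta=0.01):
    """Sum of squared out-of-plane displacements for an idealized S x L sheet."""
    x = (np.arange(S) - (S - 1) / 2) * d      # strand positions along the in-plane axis
    along = np.arange(L) - (L - 1) / 2        # atom positions along each strand
    xx, ll = np.meshgrid(x, along, indexing="ij")
    if mode == "bend":
        dz = xx**2 / R                        # uniform bend of the axis
        dz = dz - dz.mean()                   # a rigid shift is removed by alignment
    else:
        dz = ll * (xx / d) * dtheta           # uniform twist about the axis
    return float(np.sum(dz**2))

def fitted_exponent(sizes, values):
    """Slope of log(values) against log(sizes)."""
    return np.polyfit(np.log(sizes), np.log(values), 1)[0]

sizes = np.arange(20, 81, 10)
bend_S = fitted_exponent(sizes, [squared_amplitude(s, 5, "bend") for s in sizes])
bend_L = fitted_exponent(sizes, [squared_amplitude(5, n, "bend") for n in sizes])
twist_S = fitted_exponent(sizes, [squared_amplitude(s, 5, "twist") for s in sizes])
twist_L = fitted_exponent(sizes, [squared_amplitude(5, n, "twist") for n in sizes])
```

The fitted exponents approach 5 and 1 for the bend and 3 and 3 for the twist; dividing by the factor LS that enters through equipartition then gives the $S^4$ and $L^2 S^2$ laws discussed above.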



III. DISCUSSION 

Our principal-component analysis of β-sheets indicates that sheets in proteins are
deformed primarily in two ways, by twisting and bending. The amplitudes of these modes
were found to have independent Gaussian distributions, suggesting that twist and bend are
the "soft" elastic modes of sheets. The interpretation of the PCA twist and bend modes
as elastic modes of sheets is consistent with previous PCA results on α-helices. For helices,
the dominant modes found by PCA were shown to be indistinguishable from the soft elastic
modes of a model helix.

In this light, the generally larger twist and bend deformations found for anti-parallel
sheets compared to parallel sheets suggest that anti-parallel sheets have softer elastic spring
constants. This could arise from boundary effects imposed by the differences in connectivity.
Anti-parallel sheets tend to be connected by short loops, which allow the sheet to be easily
bent. In contrast, parallel sheets have more complex connectivity (long loops), which could
impinge on their ability to deform. The stiffness may also arise from slight differences in
the hydrogen-bonding pattern between parallel and sheared anti-parallel sheets. It will
be interesting to see if physical modeling can capture the apparent differences in elasticity
between the two types of sheets.

While the PCA modes of all β-sheets studied had the characteristics of elastic normal
modes (i.e., independent Gaussian distributions of amplitudes), significant deviations from
simple scaling with sheet size were observed. These deviations can probably be attributed
to two main causes. First, the actual soft modes are not the pure twist and bend modes of
rigid strands as assumed in the scaling analysis. Significant deformations do occur within
individual strands: the bend mode also contains bending along the strands, which would
contribute to the scaling of its eigenvalue with strand length L. Second, β-sheets, as
quasi-two-dimensional objects, are subject to strong boundary effects. For example, the constraint
of global connectivity of the sheets and the need to form backbone hydrogen bonds to the
outer strands are likely to introduce size-dependent effects beyond the simple scaling analysis.
In contrast, the nearly ideal scaling of the dominant modes of α-helices probably reflects the
much weaker effect of boundaries on quasi-one-dimensional objects.

Our PCA results provide a set of scaling behaviors and structural properties that can
be used to test and refine energetic models of β-sheets. We have found that a simple
spring model, similar to one that was able to capture the normal modes of helices, did not
describe well the scaling properties of sheets. This suggests that the energetics governing
β-sheet flexibility are more complex than in helices. More detailed force fields are required,
and these will help to further clarify the microscopic interactions governing β-sheet structure
and flexibility.

Another possible application of the PCA results is to protein design. A recent design
scheme focussed on the packing of a fixed set of structural elements to explore the space of
potential novel folds. In that work, only rigid helices were considered for the structural
building blocks. Because of the greater flexibility of sheets compared to helices, extending
the packing scheme to sheets will require including energies of elastic deformation. The
soft modes of sheets can be easily incorporated into the design of low-energy backbone
structures using the elastic energies $E(a_q) = c_q a_q^2/2$. Since we found only two dominant
modes for sheets, elasticity can be added to the packing scheme with only two extra degrees
of freedom per sheet. The soft modes could also be incorporated into models that analyze
how a specific protein's structure is changed when the sequence is mutated or redesigned.
Inclusion of soft modes may prove particularly useful in redesign of binding sites, which is
currently limited to the rigid-backbone approximation.
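
As an illustration of how two mode amplitudes could serve as the extra degrees of freedom, the sketch below deforms a mean structure along two modes and returns the elastic energy in units of $k_B T$. It assumes the identification $c_q = k_B T/\lambda_q$ suggested by the Gaussian amplitude distributions, and the function name, argument layout, and toy inputs are hypothetical.

```python
import numpy as np

def deform(mean_xyz, modes, eigenvalues, amplitudes):
    """Deform a mean structure along soft modes and return (coordinates, energy / k_B T).

    mean_xyz:    (n_atoms, 3) mean C-alpha coordinates, in Å
    modes:       (3 * n_atoms, n_modes) orthonormal mode vectors (columns)
    eigenvalues: (n_modes,) mode variances, in Å^2
    amplitudes:  (n_modes,) chosen mode amplitudes a_q, in Å
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    coords = mean_xyz.reshape(-1) + modes @ amplitudes
    # E / k_B T = sum_q a_q^2 / (2 lambda_q), using the assumed c_q = k_B T / lambda_q
    energy = 0.5 * np.sum(amplitudes**2 / eigenvalues)
    return coords.reshape(mean_xyz.shape), energy

# Toy inputs: 4 atoms at the origin and two unit-vector "modes".
mean_xyz = np.zeros((4, 3))
modes = np.eye(12)[:, :2]
coords, energy = deform(mean_xyz, modes, eigenvalues=[4.0, 1.0], amplitudes=[2.0, 1.0])
```

In a packing search, the two amplitudes would be varied alongside the six rigid-body coordinates of each sheet, with `energy` added to the packing score.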

In summary, this work has used database protein structures to reveal the flexible motions
of β-sheets. The effective spring constants and eigenvectors for the sheets studied, as well
as the mean structure coordinates, are available as Supplementary Material.

IV. METHODS 

We compiled a set of β-sheet structures from 2860 representative protein folds in the
FSSP database. The representative structures in the FSSP are the structurally distinct
folds that result from doing an all-against-all structure clustering of protein folds that have
less than 25% sequence identity. Thus the set is designed to minimize fold redundancy.
Making use of the structural annotations in the PDB files for the representative set, we
extracted the β-sheets from each fold. Only sheets within a certain size range were considered:
sheets had to have at least 3 strands and no more than 25, with between 3 and 15 residues
per strand. Using these criteria we were able to extract 3516 representative β-sheets.

Many of the sheets in this representative set were found to have defects. A defect occurs
when there is a gap in the hydrogen-bonding pattern, or the strands are of different lengths
and one overhangs the other. These two types of defects are schematically illustrated in
Fig. 8. We wished to eliminate sheets with defects before performing a principal-component
analysis of flexibility. To systematically identify defects, we developed a procedure for finding
the optimal pairing of Cα atoms on adjacent strands. A defect is indicated when a Cα atom
is left unpaired. To find the optimal pairing, we computed a distance matrix $d_{ij}$ for each
pair of neighboring strands,

$$d_{ij} = |\mathbf{r}_i - \mathbf{r}_j|, \tag{7}$$

where $\mathbf{r}_i$ is the position of the ith Cα atom on the first strand and $\mathbf{r}_j$ is the position of the jth
Cα atom on the second strand. The two strands could be of different lengths. We defined
the optimal pairing i ↔ j between strands to be the one which minimized

$$D = 6\,\text{Å} \times N_{\rm gap} + \sum_{{\rm pairs}\; ij} d_{ij}, \tag{8}$$

where the first term represents a penalty of +6 Å for each gap. A gap occurs when a Cα
atom on one strand is not paired. In practice, we found that a gap penalty of 6 Å identified
gaps in good agreement with a visual inspection of the sheets. To efficiently find the optimal
pairing, i.e. the one which minimizes D, we employed the Needleman-Wunsch method for
global sequence alignment. We applied the alignment procedure for each neighboring pair
of strands in the sheet (strands in the sheet tend to be annotated in the order they occur in
the sheet, hence the need to only align neighbors). This gave us the optimal alignment of
all the strands in the sheet, and allowed us to identify sheet defects.
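
A minimal dynamic-programming sketch of this pairing step is shown below. It follows the Needleman-Wunsch recursion with the Cα-Cα distance as the pairing cost and a 6 Å gap penalty, as in Eq. (8). It assumes that the atoms of both strands are listed in the same spatial direction and that end gaps (overhangs) are penalized like internal gaps; the function name and the return format are illustrative.

```python
import numpy as np

GAP = 6.0  # gap penalty in Å, as quoted in the text

def pair_strands(a, b, gap=GAP):
    """Optimal pairing of C-alpha atoms on two strands; returns (D, pairs).

    a, b: arrays of shape (n, 3) and (m, 3). An unpaired atom appears in
    `pairs` with None as its partner.
    """
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    score = np.zeros((n + 1, m + 1))
    score[:, 0] = gap * np.arange(n + 1)
    score[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = min(score[i - 1, j - 1] + dist[i - 1, j - 1],  # pair i with j
                              score[i - 1, j] + gap,                     # atom i unpaired
                              score[i, j - 1] + gap)                     # atom j unpaired
    pairs, i, j = [], n, m
    while i > 0 or j > 0:                                                # traceback
        if i > 0 and j > 0 and np.isclose(score[i, j], score[i - 1, j - 1] + dist[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and np.isclose(score[i, j], score[i - 1, j] + gap):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return score[n, m], pairs[::-1]
```

Any entry of `pairs` that contains `None` marks a defect in the sense used above.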

After performing the pairwise alignment of all the strands making up a sheet, we
extracted sheets of a fixed size, e.g. S strands each of length L. This is illustrated in
Fig. 8(a). We scanned the aligned sheet using a window of the specified dimension. If
the window spanned a region that did not contain a gap, the positions of the Cα atoms
were recorded to a file. For each extracted sheet, we also recorded the sheet geometry, i.e.
the relative amino-terminal to carboxyl-terminal directions of all the strands in the sheet.
The direction of the first strand was defined to be up to simplify the recording of sheet
geometries. The choice of which of the two outermost strands to call the first strand was
arbitrary, and this was compensated for later. Another quantity recorded for each sheet
was its "pleatedness". A β-sheet is corrugated: it has alternating ridges and valleys. If
the first row of atoms from the strands making up a sheet comprises a "ridge", we consider
the sheet to have positive pleatedness. If the first row of atoms comprises a "valley", we
consider the sheet to have negative pleatedness. The two types of pleatedness are shown
in Fig. 8(b). We consider sheets of the same size, geometry, and pleatedness to constitute
a "class". Only sheets of the same class are aligned for subsequent principal-component
analysis (see Results).
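
The window scan can be sketched as follows. The aligned sheet is represented here as an integer grid with one row per strand and one column per aligned position, holding a Cα index where an atom is present and -1 where there is a gap or no atom. This representation and the function name are assumptions made for illustration.

```python
import numpy as np

def ungapped_windows(grid, S, L):
    """Yield (first_strand, first_column) of every gap-free S x L window."""
    occupied = grid >= 0
    n_strands, n_cols = grid.shape
    for s0 in range(n_strands - S + 1):
        for c0 in range(n_cols - L + 1):
            if occupied[s0:s0 + S, c0:c0 + L].all():
                yield s0, c0

# Hypothetical aligned sheet: 3 strands, 5 aligned positions, one gap in the third strand.
grid = np.array([[0, 1, 2, 3, 4],
                 [5, 6, 7, 8, 9],
                 [10, -1, 11, 12, 13]])
three_by_three = list(ungapped_windows(grid, S=3, L=3))
two_by_three = list(ungapped_windows(grid, S=2, L=3))
```

Each window found this way would then be labeled with its geometry and pleatedness before being assigned to a class.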

We note that each extracted ungapped sheet in fact generates two sheets because either
of the two outermost strands could be called the "first". Having arbitrarily chosen a first
strand and called its direction up, there are two possible symmetry operations to obtain the
second sheet. If the last strand is also up, we flip the sheet (this causes the pleatedness to
change sign). If on the other hand the last strand is oriented down, we perform a clockwise
rotation of the sheet by 180°. These symmetry operations effectively double our database
of sheet structures.
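
The two symmetry operations can be written as coordinate transformations under an assumed convention: the sheet lies in the xy plane, neighboring strands are stacked along +x, the up direction is +y, and the sheet normal is z. The array layout (strand index, row index counted from the bottom of the page, xyz) and the function name are hypothetical.

```python
import numpy as np

def second_view(coords, last_strand_up):
    """Return the same sheet with the other outermost strand taken as the first.

    coords: array of shape (S, L, 3), indexed by strand, row, and xyz.
    """
    if last_strand_up:
        # Flip: 180-degree rotation about the y axis. Strand order reverses and
        # z changes sign, which reverses the sign of the pleatedness.
        rotated = coords * np.array([-1.0, 1.0, -1.0])
        return rotated[::-1, :, :]
    # 180-degree rotation about the sheet normal z. Strand order and row order
    # both reverse, so a last strand that pointed down now comes first and points up.
    rotated = coords * np.array([-1.0, -1.0, 1.0])
    return rotated[::-1, ::-1, :]
```

Both operations are proper rotations, so applying either one twice returns the original coordinates.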

FIGURES



FIG. 1. Distribution of geometries for β-sheets containing at least 6 strands from the set of
3516 representative structures. The strand orientations are labeled with "u" for up and "d" for 
down, according to amino-terminal to carboxyl-terminal direction. We have fixed the orientation 
of the first strand to always be up. 

FIG. 2. Demonstration set of 10 anti-parallel sheets, of 4 strands each with 5 residues per 
strand and the same pleatedness (corrugation), all aligned to the average sheet for this class. 

FIG. 3. The ten largest eigenvalues from the principal-component analysis of 1454 anti-parallel
4 × 5 β-sheets, of the class shown in Fig. 2.

FIG. 4. (a) Exaggerated twist mode of a 6-stranded β-sheet with 5 residues per strand. (b)
Exaggerated bend mode of the same sheet. Average structures are shown in gray, deformed structures
are shown in red.

FIG. 5. Scaling of the eigenvalues for the bend and twist modes as a function of the number of 
strands for sheets of fixed strand length. The scalings are shown for both parallel and anti-parallel 
sheets. 

FIG. 6. Scaling of the eigenvalues for the bend and twist modes as a function of strand length 
for sheets with fixed number of strands. The scalings are shown for both parallel and anti-parallel 
sheets. 

FIG. 7. (a,b) Distribution of the projections of sheet displacement onto the twist and bend
modes for 1454 anti-parallel β-sheets each with 4 strands of 5 residues (cf. Fig. 2). Gaussian fits
to the distributions are shown by the red curves. (c) Projections onto the subspace spanned by
the twist and bend modes for the same 1454 structures.

FIG. 8. (a) Schematic of the alignment procedure used to extract β-sheets of a fixed dimension.
On the left is a sheet that might exist within the database. Backbone Cα atoms are aligned (dashed
lines) using a global alignment procedure that allows for gaps. For example, a gap occurs in the
bottom strand at the second Cα atom from the left. All ungapped sheets of given dimension
(S strands each of length L) are then extracted. (b) The two types of pleatedness. If the first
(leftmost) row of atoms forms a ridge, we consider the sheet to have positive pleatedness. If the
first row of atoms forms a valley, we consider the sheet to have negative pleatedness.